Customer Churn Analysis

Churn rate, when applied to a customer base, refers to the proportion of contractual customers or subscribers who leave a supplier during a given time period. It is a possible indicator of customer dissatisfaction, cheaper and/or better offers from the competition, more successful sales and/or marketing by the competition, or reasons having to do with the customer life cycle.

Churn is closely related to the concept of average customer life time. For example, an annual churn rate of 25 percent implies an average customer life of four years. An annual churn rate of 33 percent implies an average customer life of three years. The churn rate can be minimized by creating barriers which discourage customers to change suppliers (contractual binding periods, use of proprietary technology, value-added services, unique business models, etc.), or through retention activities such as loyalty programs. It is possible to overstate the churn rate, as when a consumer drops the service but then restarts it within the same year. Thus, a clear distinction needs to be made between "gross churn", the total number of absolute disconnections, and "net churn", the overall loss of subscribers or members. The difference between the two measures is the number of new subscribers or members that have joined during the same period. Suppliers may find that if they offer a loss-leader "introductory special", it can lead to a higher churn rate and subscriber abuse, as some subscribers will sign on, let the service lapse, then sign on again to take continuous advantage of current specials. https://en.wikipedia.org/wiki/Churn_rate



In [2]:

    
%%capture

# Get our favorite packages from PyPI
! pip install plotly
! pip install cufflinks
! pip install h2o



# Import pre-installed packages
import numpy as np
import pandas as pd



# Suppress unwatned warnings
import warnings
warnings.filterwarnings('ignore')
import logging
logging.getLogger("requests").setLevel(logging.WARNING)



In [3]:

    
# Load our favorite visualization library

import plotly
import plotly.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
import cufflinks as cf
plotly.offline.init_notebook_mode(connected=True)

# Sign into Plotly with masked, encrypted API key

#myPlotlyKey = os.environ['SECRET_ENV_BRETTS_PLOTLY_KEY']
py.sign_in(username='bretto777',api_key='conh5EnFad0Z9Lz6mVWr')



In [4]:

    
# Load some data
churnDF = pd.read_csv('https://s3-us-west-1.amazonaws.com/dsclouddata/home/jupyter/churn_train.csv', delimiter=',')
churnDF["Churn"] = churnDF["Churn"].replace(to_replace=False, value='Retain')
churnDF["Churn"] = churnDF["Churn"].replace(to_replace=True, value='Churn')
churnDFs = churnDF.sample(frac=0.07) # Sample for speedy viz
churnDF.head(10)









    Out[4]:






  
    
      
      State
      Account Length
      Area Code
      Phone
      Int'l Plan
      VMail Plan
      VMail Message
      Day Mins
      Day Calls
      Day Charge
      ...
      Eve Calls
      Eve Charge
      Night Mins
      Night Calls
      Night Charge
      Intl Mins
      Intl Calls
      Intl Charge
      CustServ Calls
      Churn
    
  
  
    
      0
      ND
      84
      415
      400-7253
      no
      yes
      33
      159.1
      106
      27.05
      ...
      101
      12.73
      213.4
      108
      9.60
      13.0
      18
      3.51
      1
      Retain
    
    
      1
      RI
      117
      408
      370-5042
      no
      yes
      13
      207.6
      65
      35.29
      ...
      77
      12.98
      232.8
      95
      10.48
      9.7
      3
      2.62
      1
      Retain
    
    
      2
      VA
      132
      510
      343-4696
      no
      no
      0
      81.1
      86
      13.79
      ...
      72
      20.84
      237.0
      115
      10.67
      10.3
      2
      2.78
      0
      Retain
    
    
      3
      OK
      121
      408
      364-2495
      no
      yes
      31
      237.1
      63
      40.31
      ...
      117
      17.48
      196.7
      85
      8.85
      10.1
      5
      2.73
      4
      Retain
    
    
      4
      ME
      205
      510
      413-4039
      no
      yes
      24
      175.8
      139
      29.89
      ...
      98
      13.18
      180.7
      64
      8.13
      7.8
      5
      2.11
      2
      Retain
    
    
      5
      MT
      116
      415
      384-5907
      no
      yes
      35
      182.8
      122
      31.08
      ...
      119
      18.08
      193.8
      103
      8.72
      11.0
      2
      2.97
      1
      Retain
    
    
      6
      MN
      113
      408
      417-5146
      no
      no
      0
      158.9
      137
      27.01
      ...
      109
      20.64
      247.8
      97
      11.15
      6.5
      4
      1.76
      0
      Retain
    
    
      7
      MO
      74
      415
      421-2955
      no
      no
      0
      172.1
      105
      29.26
      ...
      99
      17.99
      182.2
      105
      8.20
      11.6
      6
      3.13
      1
      Retain
    
    
      8
      MI
      112
      510
      420-1383
      no
      no
      0
      243.4
      77
      41.38
      ...
      97
      15.48
      259.2
      94
      11.66
      12.8
      2
      3.46
      1
      Retain
    
    
      9
      UT
      68
      415
      403-8916
      no
      no
      0
      213.9
      112
      36.36
      ...
      100
      22.14
      233.8
      97
      10.52
      8.4
      3
      2.27
      1
      Churn
    
  

10 rows × 21 columns



In [5]:

    
# separate the calls data for plotting
churnDFs = churnDFs[['Account Length','Day Calls','Eve Calls','CustServ Calls','Churn']]

# Create scatter plot matrix of call data
splom = ff.create_scatterplotmatrix(churnDFs, diag='histogram', index='Churn',  
                                  colormap= dict(
                                      Churn = '#9CBEF1',
                                      Retain = '#04367F'
                                      ),
                                  colormap_type='cat',
                                  height=560, width=650,
                                  size=4, marker=dict(symbol='circle'))
py.iplot(splom)









    Out[5]:



In [6]:

    
%%capture
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator
from h2o.grid.grid_search import H2OGridSearch
from __future__ import print_function
h2o.init()
h2o.remove_all()



In [7]:

    
# Split data into training and testing frames

from sklearn import cross_validation
from sklearn.model_selection import train_test_split

training, testing = train_test_split(churnDF, train_size=0.8, stratify=churnDF["Churn"], random_state=9)
train = h2o.H2OFrame(python_obj=training).drop("State")
test = h2o.H2OFrame(python_obj=testing).drop("State")

# Set predictor and response variables
y = "Churn"
x = train.columns
x.remove(y)









    



/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning:

This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.







    



Parse progress: |█████████████████████████████████████████████████████████| 100%
Parse progress: |█████████████████████████████████████████████████████████| 100%

Super Learner

The super learner is a prediction method designed to find the optimal combination of a collection of prediction algorithms. The super learner algorithm finds the combination of algorithms minimizing the cross-validated risk. The super learner framework is9 built in the theory of cross-validation and allows for a general class of prediction algorithms to be considered for the ensemble. http://biostats.bepress.com/ucbbiostat/paper266/ (Polley & Van der Laan, 2010)



In [8]:

    
# Reset variables
del allModels, gridGBM, gridRF, grids, dfGridGBM, dfGridRF, ensemble









    




NameErrorTraceback (most recent call last)
<ipython-input-8-9dadbaa21fd6> in <module>()
      1 # Reset variables
----> 2 del allModels, gridGBM, gridRF, grids, dfGridGBM, dfGridRF, ensemble

NameError: name 'allModels' is not defined



In [ ]:

    
# Reset variables
del SuperModel, BestModel, Model3, Model4, Model5, Model6, Model7, Model8, Model9, Model10



In [9]:

    
%%time
# Specify GBM hyperparameters and stopping criteria for the grid
nfolds = 5
gbm_hyper_params = {"learn_rate":[0.075, 0.1], "nbins":[10,15,20],"ntrees": [20,30,40], "max_depth": [5,7,9], "sample_rate": [0.75, 0.8, 0.85, 0.9]}
search_criteria = {"strategy": "RandomDiscrete", "max_models": 6}

# Setup the GBM grid search
gridGBM = H2OGridSearch(model=H2OGradientBoostingEstimator(balance_classes=True, seed=123, nfolds=nfolds, fold_assignment="Modulo", keep_cross_validation_predictions=True),
                     hyper_params=gbm_hyper_params,
                     search_criteria=search_criteria,
                     grid_id="gbm_grid_binomial")

# Start the GBM training
gridGBM.train(x=x, y=y, training_frame=train)
         

    
# Setup the Random Forest grid search
rf_hyper_params = {"mtries":[12,15,18],"nbins":[10,20,30], "ntrees": [25,50,75], "max_depth": [5,7], "sample_rate": [0.75, 0.8, 0.85, 0.9]}
gridRF = H2OGridSearch(model=H2ORandomForestEstimator(balance_classes=True, seed=123, nfolds=nfolds, fold_assignment="Modulo", keep_cross_validation_predictions=True),                     
                      hyper_params=rf_hyper_params,
                      search_criteria=search_criteria,
                      grid_id="rf_grid_binomial")

# Start the Random Forest training
gridRF.train(x=x, y=y, training_frame=train)


# List the GBMs and Random Forests that we wish to ensemble
grids = gridGBM.model_ids + gridRF.model_ids


# Train the super learner
ensemble = H2OStackedEnsembleEstimator(model_id="GBM_RF_ensemble_", base_models=grids, training_frame=train, validation_frame=test)
ensemble.train(x=x, y=y, training_frame=train)


# Evaluate ensemble performance on the test data
perf_stack_test = ensemble.model_performance(test)


# Compare the super learner to the base learners. First, combine all the base models into a single list, sorted by auc.
dfGridGBM = pd.DataFrame(data=gridGBM.get_grid(sort_by="auc", decreasing=True).sorted_metric_table())
dfGridRF = pd.DataFrame(data=gridRF.get_grid(sort_by="auc", decreasing=True).sorted_metric_table())
allModels = dfGridGBM.append(dfGridRF)
allModels['auc'] = allModels['auc'].astype('float64')
allModels.sort_values(by="auc", ascending=False, inplace=True)
allModels = allModels.reset_index()
baselearner_best_name = allModels.loc[0,'model_ids']
baselearner_best_auc = allModels.loc[0,'auc']

# Best stacked model auc
stack_auc_test = perf_stack_test.auc()


print("Best Base-learner Test AUC: " + str(baselearner_best_auc))
print("Ensemble Test AUC: " + str(stack_auc_test))









    



gbm Grid Build progress: |█████████████████████████████████████████████████| 100%
drf Grid Build progress: |████████████████████████████████████████████████| 100%
stackedensemble Model Build progress: |███████████████████████████████████| 100%
Best Base-learner Test AUC: 0.909450898025
Ensemble Test AUC: 0.951612024092
CPU times: user 2.26 s, sys: 0 ns, total: 2.26 s
Wall time: 52.2 s



In [10]:

    
allModels









    Out[10]:






  
    
      
      index
      
      auc
      learn_rate
      max_depth
      model_ids
      mtries
      nbins
      ntrees
      sample_rate
    
  
  
    
      0
      0
      
      0.909451
      0.1
      7
      gbm_grid_binomial_model_4
      NaN
      10
      40
      0.8
    
    
      1
      1
      
      0.909101
      0.075
      5
      gbm_grid_binomial_model_3
      NaN
      20
      40
      0.9
    
    
      2
      2
      
      0.907738
      0.1
      9
      gbm_grid_binomial_model_1
      NaN
      10
      40
      0.9
    
    
      3
      3
      
      0.906726
      0.1
      5
      gbm_grid_binomial_model_0
      NaN
      20
      30
      0.8
    
    
      4
      4
      
      0.905211
      0.075
      9
      gbm_grid_binomial_model_2
      NaN
      10
      20
      0.8
    
    
      5
      5
      
      0.904946
      0.075
      7
      gbm_grid_binomial_model_5
      NaN
      20
      30
      0.85
    
    
      6
      0
      
      0.900719
      NaN
      7
      rf_grid_binomial_model_5
      18
      10
      75
      0.75
    
    
      7
      1
      
      0.900073
      NaN
      7
      rf_grid_binomial_model_0
      15
      30
      75
      0.75
    
    
      8
      2
      
      0.898967
      NaN
      7
      rf_grid_binomial_model_2
      12
      30
      50
      0.85
    
    
      9
      3
      
      0.895928
      NaN
      5
      rf_grid_binomial_model_3
      12
      20
      25
      0.8
    
    
      10
      4
      
      0.893725
      NaN
      5
      rf_grid_binomial_model_1
      15
      20
      25
      0.75
    
    
      11
      5
      
      0.888636
      NaN
      5
      rf_grid_binomial_model_4
      15
      30
      25
      0.9

Variable Importances

Below we plot variable importances as reported by the best performing algo in the ensemble.



In [11]:

    
best = h2o.get_model(baselearner_best_name)

importances = best.varimp(use_pandas=True)
importances = importances.loc[:,['variable','relative_importance']].groupby('variable').mean()
importances.sort_values(by="relative_importance", ascending=False).iplot(kind='bar', colors='#5AC4F2', theme='white')









    Out[11]:

Super Model vs the Base models

This plot shows the ROC curves for the Super Model, the Best Base Model, and 9 next best models in the ensemble.



In [12]:

    
SuperModel = np.array(ensemble.roc(valid=True))
BestModel = np.array(h2o.get_model(baselearner_best_name).roc(xval=True))
Model2 = np.array(h2o.get_model(allModels.loc[1,'model_ids']).roc(xval=True))
Model3 = np.array(h2o.get_model(allModels.loc[2,'model_ids']).roc(xval=True))
Model4 = np.array(h2o.get_model(allModels.loc[3,'model_ids']).roc(xval=True))
Model5 = np.array(h2o.get_model(allModels.loc[4,'model_ids']).roc(xval=True))
Model6 = np.array(h2o.get_model(allModels.loc[5,'model_ids']).roc(xval=True))
Model7 = np.array(h2o.get_model(allModels.loc[6,'model_ids']).roc(xval=True))
Model8 = np.array(h2o.get_model(allModels.loc[7,'model_ids']).roc(xval=True))
Model9 = np.array(h2o.get_model(allModels.loc[8,'model_ids']).roc(xval=True))
Model10 = np.array(h2o.get_model(allModels.loc[9,'model_ids']).roc(xval=True))



layout = go.Layout(autosize=False, width=725, height=575,  xaxis=dict(title='False Positive Rate', titlefont=dict(family='Arial, sans-serif', size=15, color='grey')), 
                                                           yaxis=dict(title='True Positive Rate', titlefont=dict(family='Arial, sans-serif', size=15, color='grey')))

SuperModelTrace = go.Scatter(x = SuperModel[0],y = SuperModel[1], mode = 'lines', name = 'Super Model', line = dict(color = ('rgb(26, 58, 126)'), width = 3))
BestModelTrace = go.Scatter(x = BestModel[0],y = BestModel[1], mode = 'lines', name = 'Best Base Model', line = dict(color = ('rgb(135, 160, 216)'), width = 3))
Model2Trace = go.Scatter(x = Model2[0], y = Model2[1], mode = 'lines', name = 'Model 2', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model3Trace = go.Scatter(x = Model3[0], y = Model3[1], mode = 'lines', name = 'Model 3', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model4Trace = go.Scatter(x = Model4[0], y = Model4[1], mode = 'lines', name = 'Model 4', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model5Trace = go.Scatter(x = Model5[0], y = Model5[1], mode = 'lines', name = 'Model 5', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model6Trace = go.Scatter(x = Model6[0], y = Model6[1], mode = 'lines', name = 'Model 6', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model7Trace = go.Scatter(x = Model7[0], y = Model7[1], mode = 'lines', name = 'Model 7', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model8Trace = go.Scatter(x = Model8[0], y = Model8[1], mode = 'lines', name = 'Model 8', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model9Trace = go.Scatter(x = Model9[0], y = Model9[1], mode = 'lines', name = 'Model 9', line = dict(color = ('rgb(156, 190, 241)'), width = 1))
Model10Trace = go.Scatter(x = Model10[0], y = Model10[1], mode = 'lines', name = 'Model 10', line = dict(color = ('rgb(156, 190, 241)'), width = 1))

traceChanceLine = go.Scatter(x = [0,1], y = [0,1], mode = 'lines+markers', name = 'chance', line = dict(color = ('rgb(136, 140, 150)'), width = 4, dash = 'dash'))

fig = go.Figure(data=[SuperModelTrace,BestModelTrace,Model2Trace,Model3Trace,Model4Trace,Model5Trace,Model7Trace,Model8Trace,Model9Trace,Model10Trace,traceChanceLine], layout=layout)


py.iplot(fig)









    Out[12]:

Confusion Matrix



In [13]:

    
cm = perf_stack_test.confusion_matrix()
cm = cm.table.as_data_frame()
cm
confusionMatrix = ff.create_table(cm)
confusionMatrix.layout.height=300
confusionMatrix.layout.width=800
confusionMatrix.layout.font.size=17
py.iplot(confusionMatrix)









    Out[13]:

Business Impact Matrix

Weighting Predictions With a Dollar Value

Correctly predicting false: 5
Correctly predicting true: 75
Incorrectly predicting false: 150
Incorrectly predicting true: 1.5



In [22]:

    
cm.loc[0,'Churn'] * 75









    Out[22]:





4800.0



In [23]:

    
CorrectPredictChurn = cm.loc[0,'Churn']
CorrectPredictChurnImpact = 75
cm1 = CorrectPredictChurn*CorrectPredictChurnImpact

IncorrectPredictChurn = cm.loc[1,'Churn']
IncorrectPredictChurnImpact = -5
cm2 = IncorrectPredictChurn*IncorrectPredictChurnImpact

IncorrectPredictRetain = cm.loc[0,'Retain']
IncorrectPredictRetainImpact = -150
cm3 = IncorrectPredictRetain*IncorrectPredictRetainImpact

CorrectPredictRetain = cm.loc[0,'Retain']
CorrectPredictRetainImpact = 5
cm4 = IncorrectPredictRetain*CorrectPredictRetainImpact


data_matrix = [['Business Impact', '($) Predicted Churn', '($) Predicted Retain', '($) Total'],
               ['($) Actual Churn', cm1, cm3, '' ],
               ['($) Actual Retain', cm2, cm4, ''],
               ['($) Total', cm1+cm2, cm3+cm4, cm1+cm2+cm3+cm4]]

impactMatrix = ff.create_table(data_matrix, height_constant=20, hoverinfo='weight')
impactMatrix.layout.height=300
impactMatrix.layout.width=800
impactMatrix.layout.font.size=17
py.iplot(impactMatrix)









    Out[23]:



In [15]:

    
print("Best learner AUC: " + str(baselearner_best_auc))









    



Best learner AUC: 0.909450898025



In [16]:

    
print("Super Model AUC: " + str(stack_auc_test))









    



Super Model AUC: 0.951612024092



In [17]:

    
print("Total customers evaluated: 534")









    



Total customers evaluated: 534



In [18]:

    
print("Total value created by the model: $" + str(cm1+cm2+cm3+cm4))









    



Total value created by the model: $2305.0



In [19]:

    
print("Total value per customer: $" +str(round(((cm1+cm2+cm3+cm4)/534),3)))









    



Total value per customer: $4.316

Save the Model



In [ ]:

    
# Save the model and upload it to s3
import os
from boto.s3.connection import S3Connection
from boto.s3.key import Key



In [ ]:

    
def upload_file_to_s3(myFile):
    def get_bucket():
        access= os.environ['SECRET_ENV_AWS_ACCESS_KEY_BRETT'] 
        secret= os.environ['SECRET_ENV_AWS_SECRET_KEY_BRETT']
        customer = 'demonstration'
        conn = S3Connection(access,secret)
        b = conn.get_bucket('dsclouddata',validate=False)
        return b

    s3_bucket = get_bucket()
    k = Key(s3_bucket)    
    k.key = myFile
    k.set_contents_from_filename(myFile)
    k.make_public()
    successMessage = "Uploaded %s to S3."%(myFile)    
    return successMessage



In [ ]:

    
myFile = h2o.save_model(ensemble, force=True)
# Upload the file
upload_file_to_s3(myFile)



In [ ]:

    
def pull_file_from_s3(key):
    def get_bucket():            
        access= os.environ['SECRET_ENV_AWS_ACCESS_KEY_BRETT'] 
        secret= os.environ['SECRET_ENV_AWS_SECRET_KEY_BRETT']
        customer = 'demonstration'
        conn = S3Connection(access,secret)
        b = conn.get_bucket('dsclouddata',validate=False)
        return b

    s3_bucket = get_bucket()
    payload = s3_bucket.get_key(key)
    local_file = payload.get_contents_to_filename(key)
    return key



In [ ]:

    
# download the model from s3
downloaded_model = pull_file_from_s3('/home/jupyter/GBM_RF_ensemble_')  

# load the downloaded model into memory
ChurnPredictor = h2o.load_model(path=downloaded_model)



In [ ]:

    
def churn_predict(batchFile):
    # connect to the model scoring service
    h2o.init()

    # load the user-specified file
    newData = h2o.import_file(batchFile)

    # open the downloaded model
    ChurnPredictor = h2o.load_model(path=downloaded_model)  
    
    # evaluate the feature vector using the model
    predictions = ChurnPredictor.predict(newData)
    predictions = newData.cbind(predictions)
    h2o.download_csv(predictions, 'predictions.csv')
    
    upload_file_to_s3('/home/jupyter/predictions.csv')
    return "Saved predictions.csv to your S3 bucket."



In [ ]:

    
churn_predict(batchFile = 'https://s3-us-west-1.amazonaws.com/dsclouddata/home/jupyter/churn_test.csv')



In [ ]:

    
def churn_predict(State,AccountLength,AreaCode,Phone,IntlPlan,VMailPlan,VMailMessage,DayMins,DayCalls,DayCharge,EveMins,EveCalls,EveCharge,NightMins,NightCalls,NightCharge,IntlMins,IntlCalls,IntlCharge,CustServCalls):
    # define a feature vector to evaluate with the model
    newData = pd.DataFrame({'State' : State,
                            'Account Length' : AccountLength,
                            'Area Code' : AreaCode,
                            'Phone' : Phone,
                            'Int\'l Plan' : IntlPlan,
                            'VMail Plan' : VMailPlan,
                            'VMail Message' : VMailMessage,
                            'Day Mins' : DayMins,
                            'Day Calls' : DayCalls,
                            'Day Charge' : DayCharge,
                            'Eve Mins' : EveMins,
                            'Eve Calls' : EveCalls,
                            'Eve Charge' : EveCharge,
                            'Night Mins' : NightMins,
                            'Night Calls' : NightCalls,
                            'Night Charge' : NightCharge,
                            'Intl Mins' :IntlMins,
                            'Intl Calls' : IntlCalls,
                            'Intl Charge' : IntlCharge,
                            'CustServ Calls' : CustServCalls}, index=[0])
    
    # evaluate the feature vector using the model
    predictions = ChurnPredictor.predict(h2o.H2OFrame(newData))
    predictionsOut = h2o.as_list(predictions, use_pandas=False)
    return predictionsOut



In [ ]:

    
State = "KS"
AccountLength = 1 
AreaCode = 213
Phone = "362-1234"
IntlPlan = "no"
VMailPlan = "no"
VMailMessage = 0
DayMins = 0
DayCalls = 2
DayCharge = 20
EveMins = 120
EveCalls = 97
EveCharge = 7
NightMins = 2
NightCalls = 7
NightCharge = 10
IntlMins = 13
IntlCalls = 0
IntlCharge = 3.67
CustServCalls = 2
    

churn_predict(State,AccountLength,AreaCode,Phone,IntlPlan,VMailPlan,VMailMessage,DayMins,DayCalls,DayCharge,EveMins,EveCalls,EveCharge,NightMins,NightCalls,NightCharge,IntlMins,IntlCalls,IntlCharge,CustServCalls)

	State	Account Length	Area Code	Phone	Int'l Plan	VMail Plan	VMail Message	Day Mins	Day Calls	Day Charge	...	Eve Calls	Eve Charge	Night Mins	Night Calls	Night Charge	Intl Mins	Intl Calls	Intl Charge	CustServ Calls	Churn
0	ND	84	415	400-7253	no	yes	33	159.1	106	27.05	...	101	12.73	213.4	108	9.60	13.0	18	3.51	1	Retain
1	RI	117	408	370-5042	no	yes	13	207.6	65	35.29	...	77	12.98	232.8	95	10.48	9.7	3	2.62	1	Retain
2	VA	132	510	343-4696	no	no	0	81.1	86	13.79	...	72	20.84	237.0	115	10.67	10.3	2	2.78	0	Retain
3	OK	121	408	364-2495	no	yes	31	237.1	63	40.31	...	117	17.48	196.7	85	8.85	10.1	5	2.73	4	Retain
4	ME	205	510	413-4039	no	yes	24	175.8	139	29.89	...	98	13.18	180.7	64	8.13	7.8	5	2.11	2	Retain
5	MT	116	415	384-5907	no	yes	35	182.8	122	31.08	...	119	18.08	193.8	103	8.72	11.0	2	2.97	1	Retain
6	MN	113	408	417-5146	no	no	0	158.9	137	27.01	...	109	20.64	247.8	97	11.15	6.5	4	1.76	0	Retain
7	MO	74	415	421-2955	no	no	0	172.1	105	29.26	...	99	17.99	182.2	105	8.20	11.6	6	3.13	1	Retain
8	MI	112	510	420-1383	no	no	0	243.4	77	41.38	...	97	15.48	259.2	94	11.66	12.8	2	3.46	1	Retain
9	UT	68	415	403-8916	no	no	0	213.9	112	36.36	...	100	22.14	233.8	97	10.52	8.4	3	2.27	1	Churn

	index	auc	learn_rate	max_depth	model_ids	mtries	nbins	ntrees	sample_rate
0	0	0.909451	0.1	7	gbm_grid_binomial_model_4	NaN	10	40	0.8
1	1	0.909101	0.075	5	gbm_grid_binomial_model_3	NaN	20	40	0.9
2	2	0.907738	0.1	9	gbm_grid_binomial_model_1	NaN	10	40	0.9
3	3	0.906726	0.1	5	gbm_grid_binomial_model_0	NaN	20	30	0.8
4	4	0.905211	0.075	9	gbm_grid_binomial_model_2	NaN	10	20	0.8
5	5	0.904946	0.075	7	gbm_grid_binomial_model_5	NaN	20	30	0.85
6	0	0.900719	NaN	7	rf_grid_binomial_model_5	18	10	75	0.75
7	1	0.900073	NaN	7	rf_grid_binomial_model_0	15	30	75	0.75
8	2	0.898967	NaN	7	rf_grid_binomial_model_2	12	30	50	0.85
9	3	0.895928	NaN	5	rf_grid_binomial_model_3	12	20	25	0.8
10	4	0.893725	NaN	5	rf_grid_binomial_model_1	15	20	25	0.75
11	5	0.888636	NaN	5	rf_grid_binomial_model_4	15	30	25	0.9